Optimal time delay embedding for nonlinear time series modeling 
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When building linear or nonlinear models one is faced with the problem of selecting the best set 
of variable with which to predict the future dynamics. In nonlinear time series analysis the problem 
is to select the correct time delays in the time delay embedding. We propose a new technique which 
can quantify the suitability of a particular set of variables and we suggests a computationally efficient 
scheme to determine the best non-uniform time delay embedding for modeling of time series. Our 
results are based on the assumption that, in general, the variables which give the best local constant 
model will also give the best nonlinear model. In a wide variety of experimental and simulated 
systems we find that this method produces dynamics that are more realistic and predictions that 
are more accurate than standard uniform embeddings. 
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An autoregressive model predicts future evolution as 
a linear combination of past observations. An artificial 
neural network combines various "inputs" to predict un- 
known data. While the theory of modeling (both lin- 
ear and nonlinear) is well developed there is no general 
method to choose the correct variables ( "inputs" or "ob- 
servations") with which to predict future dynamics. In 
this communication we propose a quantitative criterion 
which may be used to assess the relative merit of vari- 
ous combinations of variables. To illustrate this concept 
we consider the specific problem of time series predic- 
tion when only a scalar time series is available and one 
reconstructs the underlying dynamics with a time delay 
embedding. The generalization of this method to other 
situations, such as multivariate time series, is obvious. 

Very often, a high dimensional physical system is only 
observable through a singk scalar variable. The method 
of time delay embedding [l| is widely applied to estimate 
the evolution of the underlying vector field. From a scalar 
time series {a;*}^^ of N observations one reconstructs a 
vector time series with evolution topologically equivalent 
to the original system via the transformation 

Even in the restricted field of time delay embedding, 
there is no general method to select the best group of 
variables (Q) . Several authors have argued that the criti- 
cal parameter is the product deT 0], but, in general one 
must estimate both the embedding dimension and the 
embedding lag r. Embedding dimension is usually esti- 
mated via the application of geometric methods such as 
false nearest neighbors (3| and embedding lag is related 
to underlying time-scales (such as pseudo-periodicity) in 
the time series 0. However, most of these approaches 
are motivated by the objective of accurately estimating 
dynamic invariants. In (21 we see that which embedding 
criterion is judged to be best will depend on the adjudi- 
cating criterion. 

Furthermore, there is no reason to suppose that a uni- 
form embedding, such as J^lj is the correct approach 



In general, the problem of estimating the optimum 
embedding should be restated as: find the parameters 
= l,...fc} and the embedding window dw, where 
1 < < di < £i+i < ik < dw and the time delay embed- 
ding 

Xt {xt-i^,xt-i^,xt-i3, ■ . -xt-i^) (2) 

is somehow the "best" . For nonlinear time series mod- 
eling, embedding strategies such as this were introduced 
by Judd and Mees Q and are described as non-uniform 
embeddings. This problem is now a special case of the 
more general problem of selecting the best set of vari- 
ables ("inputs") to model (for example, via an artificial 
neural network) some unknown quantity. 

Unfortunately, application of equation jSJ makes the 
problem of selecting embedding parameters considerably 
more complicated. In this paper we propose a suitable 
criterion for quantitatively comparing embedding strate- 
gies and describing an efficient scheme for the computa- 
tion of = 1, . . . , A:} and k. We apply this method to 
several experimental and simulated time series and show 
that the non-uniform embedding strategy has many ad- 
vantages over the standard techniques Non-Uniform 
embedding strategies usually utilise a smaller embedding 
window and provide better nonlinear predictions (small 
mean prediction error). We find that employing a non- 
uniform embedding strategy allows one to simulate com- 
plex nonlinear dynamics that are qualitatively more like 
the true system (or in the case of experimental data, sim- 
ply more plausible). 

Often, the purpose of time delay embedding is to esti- 
mate correlation dimension or other dynamic invari- 
ants Q. In such situations, embeddings such as ^ are 
usually adequate. In this work we focus on the problem 
of estimating the underlying evolution operator of the 
dynamical system from a single scalar observable. We 
are therefore interested in obtaining the most accurate 
prediction of the observed data values. By doing so we 
hope to capture the long term dynamics of the underlying 
system. To achieve this we adopt the information the- 
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Lorenz 
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TABLE I: Embedding parameters for the various data considered in this paper: data length (A''), physical sampling rate (/), 
data "pseudo-period" estimated as mean cycle time (T), embedding dimension computed with the method of false nearest 
neighbors (de), embedding lag estimated by the first zero of autocorrelation (r), non- uniform embedding window computed 
with the method described in _8] (dw), and the non-uniform set of embedding lags (such that (xt-ii, ■ ■ ■ jXt-e^) is used to 
predict Xt). With the exception of the Lorenz system, r is approximately one-quarter of the pseudo-period of the time series. 



oretic measure description length and seek to choose 
the embedding which provides the minimum description 
length. This method can be applied equally to a variety 
of other modeling regimes. 

Roughly speaking the description length of a time se- 
ries is the compression of the finite precision data af- 
forded by the model of that data . If a model is poor 
then it will be more economical to simply describe the 
model prediction errors. Conversely, if a model fits the 
data well, then the description of that model and the 
(presumably small) model prediction errors will be more 
compact. However, if a model over-fits the data ^3 then 
the description of the model itself will be too large. In 
we showed that the description length DL{-) of a time 
series {xt} is approximated by 
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d + DL{d) + DL{x) + DL{V). (3) 
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E(xt) is the mean of the 



where d = max^ 

data, {et}t = d + l" are the model prediction errors, and 
DL{V) is the description length of the model parameters. 
The description length of an integer d can be shown to be 
DL{d) = [logd] -I- [log [logd]] -I- . . . where each term on 
the right is an integer and the last term in the series is 
0. Furthermore, § {1 + ln2n)+DL{x) is independent of 
the embedding strategy. Hence, the optimal embedding 
strategy is that which minimizes 



2 



N -d 



1 

-^(a;, -x) 

— t 



N ~d 



d + DL{d) + 

^DL{V). (4) 



i=d+l 

The first three terms in Q may be computed directly. 



However, the last two terms require one to estimate the 
optimal model. 

As in Ql, for the purposes of computational expedi- 
ency, we restrict ourselves to the class of local constant 
models. In the current context this is not unreasonable as 
we hope to obtain an embedding which spreads the data 
in phase space based on the deterministic dynamic evolu- 
tion. Under this assumption, DL{V) = and the model 

prediction error j^z^d Tl!i=d+i ^1 computed via 

"drop-one-out" interpolation. That is, e^+i — Xi+i—Xj+i 
where j G {1, 2, . . . , iV}\{i} is such that ||a:i — is mini- 
mal. Note that, in the limit as iV — > oo (i.e. N ^ d) opti- 
mizing is equivalent to finding the embedding which 
provides the best prediction (the last two terms of Q 
dominate) . 

To minimize equation Q we assume that the maxi- 
mum number of inputs, d, has already been calculated. 
One may choose d = d^, the embedding window com- 
puted using the method described in Alternatively, 
one may either assign an arbitrary value for d or use 
d = df,T where both dg and r are estimated by one of the 
many standard techniques. The technique suggested in 
gives an upper bound on the embedding window dyj. 
But the method offered in provides no way of choos- 
ing the optimal set of embedding lags. In fact, the main 
conclusion of Q is that estimating should be done 
prior to modeling, but estimating the actual embedding 
lags should be considered part of the modeling process. 
In this paper we apply the computational procedure de- 
scribed below to select the embedding that optimizes 
This solves the main problem raised by For the 
numerical simulations presented here, we choose d such 
that d> dw 

An exhaustive search on the 2'^ possible embedding 
strategies is only feasible for small d. For large d (i.e. 
d > 10) we utilise a genetic algorithm to determine the 
optimum embedding strategies. Furthermore, to reduce 
the computational effort in estimating the model pre- 
diction error for large N {N > 1000) we minimize the 
prediction error only on a randomly selected subset of 
the data. Our calculations show that neither of these ap- 
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est. 


median 


mean 


median 
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Sunspots 


2.09 


3.04±4.95 


2.24 


2.19±0.413 


1.89 


VF 


1.46 


1.46±0.0486 


1.56 


1.56±0.0387 


1.62 


Laser 


2.21 


2.20±0.116 


2.48 


2.45±0.190 


2.13 


Rossler 


1.31 


1.32±0.128 


1.34 


1.34±0.0969 


1.59 


R.+ noise 


1.96 


1.94±0.135 


1.89 


1.86±0.141 


1.82 


Lorenz 


1.98 


1.98±0.0789 


1.86 


1.86±0.0968 


1.97 


L.+ noise 


1.64 


1.64±0.0743 


1.64 


1.63±0.0576 


1.61 



TABLE IL Comparison of correlation dimension estimates for 
the data and local constant model simulations (as described 
in lUl) using either the standard uniform or non-uniform em- 
bedding strategy. We computed 30 simulations with either 
embedding strategy for each data set and report here the me- 
dian, mean and standard deviation of the correlation dimen- 
sion estimates (computed with the values de and r reported in 
tablePl. For reference the value of correlation dimension esti- 
mated from the time series data is also provided. All results 
are rounded to 3 significant figures. 



proximations adversely affect our results. The results of 
the genetic algorithm are robust and accurate. Further- 
more, we find that provided the data subset is selected 
with replacement and that it is moderately large, the fi- 
nal solution is independent of the specific subset selected. 
Our choice of genetic algorithm (over alternative optimi- 
sation techniques) is arbitrary. Other techniques (such as 
simulated annealing) may also perform well, but remain 
untested. 

We tested this algorithm with data from three experi- 
mental systems (th e famous annual sunspot time series, 
a chaotic laser jl2j . and a recording of human electro- 
cardiogram during ventricular fibrillation (VF) [T^l. and 
two computational simulations (Rossler and Lorenz equa- 
tions) both with and without the addition of Gaussian 
noise with a standard deviation of 5% that of the data. 
For each data set we estimated the embedding window 
dw the embedding dimension de (via false nearest 
neighbors) and the embedding lag r (using the first zero 
of the autocorrelation) . The results of these calculation 
together with the non-uniform embedding strategy esti- 
mated using the methods proposed here are reported in 
Table D 

For each of these systems we estimated the best non- 
uniform embedding strategy using the Genetic Algorithm 
and (where necessary) the sub-sample selection scheme 
30 times. All the data sets except the longest (the EGG 
recording and the laser system) produced identical results 
on repeated execution. For the two longest data sets, 
the most often observed embedding strategy was also the 
best (indicating that the sub-sample selection scheme is 
expedient but perhaps not always accurate). Tabled also 
illustrates that, in most cases the non- uniform embed- 
ding covered a smaller range of embedding lags than the 



standard method (i.e. £k < der) and is often of lower di- 
mension (fc < de). Perhaps intuitively, noisier time series 
required larger k. Furthermore, we note that in none of 
the cases was the best non-uniform embedding strategy 
actually uniform. 

To test how good these non-uniform embedding strate- 
gies are at modeling the underlying dynamics, we ap- 
ply two distinct modeling schemes. For each scheme we 
compare the results obtained with both the uniform and 
non-uniform embedding strategy. For either modeling 
scheme we simulate trajectories on the underlying de- 
terministic dynamical system in the presence of noise. 
These random trajectories are iterates of the determin- 
istic model with the addition of random perturbations 
(with expected variance less than the model mean square 
error) added to the prediction at each step. In other 
words, if 

where is a deterministic map model with model predic- 
tion error et+i, then the random trajectory yt is obtained 
from 

Vt+i = F{yt-i^,yt-i^,yt-i3,, ■ ■ ■),yt^it) = Vt+i + et+i 

where j/o — (for some j selected at random) and ~ 
7V(0,a2) (a2<i?(e?)). 

The first modeling scheme is essentially iterated "drop- 
one-out" constant interpolation as described in jl4]. By 
construction, the non-uniform embedding strategy will 
have the optimal short term prediction. However, in Ta- 
ble |n] we test how well this strategy captures the long 
term dynamics. For each time series and each embed- 
ding strategy we compute 30 random trajectories and 
compare the correlation dimension estimates 0. Corre- 
lation dimension estimates are used here as a quantita- 
tive comparison, we do not claim that it is an accurate 
(or even unbiased) estimate of the attractor's true corre- 
lation dimension. Table ^ shows that both embedding 
strategies perform fairly well, with the non-uniform em- 
bedding strategy performing significantly better for the 
short or noisy data sets. In most other cases the dif- 
ference is not significant. In no instances did the non- 
uniform embedding strategy perform significantly worse 
or fail to capture the dynamics (the large variance for the 
uniform embedding strategy indicates that it often failed 
to accurately capture the dynamics of the sunspots time 
series). 

The second modeling scheme is more sophisticated and 
is an attempt to genuinely estimate the underlying deter- 
ministic dynamics of the system (this is not possible from 
a local constant method, despite the admirable results of 
Table For each data set we compare the uniform 
embedding strategy to the non-uniform embedding 
strategy Q by constructing nonlinear models using the 
method described in 0. These models are radial ba- 
sis models with the number of radial basis functions de- 
termined according to the minimum description length 
principle. 
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model size 


prediction error 


correlation dimension 


data 


uniform 


non- uniform 


uniform 


non-uniform 


uniform 


non-uniform 


data 


Sunspots 


2.07±0.828 


2.83±0.834 


0.472±0.0637 


0.44±0.0523 


1.21±0.849 


1.15±0.614 


1.89 


VF 


6±1.26 


8.6±1.79 


0.264±0.00335 


0.254±0.00401 


1.01±0.492 


1.02±0.583 


1.62 


Laser 


19.7±3.59 


20.3±4.21 


0.179±0.0166 


0.194±0.0223 


1.06±0.771 


1.44±0.727 


2.13 


Rossler 


15.8±2.73 


13.9±2.26 


0.0353±0.00709 


0.0395±0.00686 


0.999±0.548 


1.29±0.608 


1.59 


R.+noise 


7.53±1.48 


6.4±0.675 


0.0996±0.00655 


0.103±0.00698 


1.16±0.506 


1.11±0.377 


1.82 


Lorenz 


6.77±1.04 


12.7±3.22 


0.131±0.0189 


0.0597±0.00918 


0.175±0.215 


0.989±0.281 


1.97 


L.+noise 


6.3±0.988 


6.67±1.09 


0.159±0.00814 


0.109±0.00861 


0.122±0.286 


1.03±0.311 


1.61 



TABLE IIL Comparison of modeling results for the uniform (de and r) and non- uniform {li, . . . ik) embedding parameters listed 
in Table H] For each embedding strategy we constructed 30 nonlinear models, with minimum description length as a selection 
criterion and computed the average number of model parameters and the average out-of-sample iterated model prediction 
error. For each model we also computed the mean correlation dimension estimate (computed with the values and r listed 
earlier) for 30 simulations (different initial conditions) and report the median value over all models. For reference the value of 
correlation dimension estimated from the time series data is also provided. All results are rounded to 3 significant figures. 



For each data set we computed 30 nonlinear models 
with either embedding strategy and report in Table IIIII 
the average model size (the number of basis functions 
in the best radial basis model) and normalized out-of- 
sample iterated mean model prediction error for the min- 
imum description length best model (repeated modeling 
attempts are required because this highly nonlinear fit- 
ting procedure is stochastic). Furthermore, for each of 
the 30 models we generate 30 random trajectories of N 
time steps. For each of these iterated predictions we 
computed correlation dimension using the technique de- 
scribed in li]. Table Hn also compares the correlation 
dimension of the data to the simulations with either mod- 
eling scheme. In general we observe that the non-uniform 
embedding scheme affords larger models with smaller 
prediction errors and correlation dimensions closer to 
that of the true data. That is, both the qualitative and 
quantitative dynamics are reproduced much better with 
these non-uniform embedding strategies. 

Time delay embedding is a fundamental technique for 
the reconstruction of nonlinear dynamical systems from 
time series. It is commonly applied to time series data 
and almost ubiquitously via estimation of de and r and 
applying the transformation We have argued (based 
on the work of other authors) that this approach is not 
optimal, and that in general one should apply a non- 
uniform embedding such as Currently there is no 
generic method for choosing the best embedding strategy 
from among all possible non-uniform embeddings. The 
main problem is that one must have a quantitative and 
easily computable measure of the comparative suitabil- 
ity of competing embedding strategies (0) . Motivated by 



information theoretic concerns, we propose a simple es- 
timate of the "goodness" of embedding strategies based 
primarily on the nonlinear prediction error of a local con- 
stant model (0}. We find that it is necessary to augment 
this with a combination of a genetic algorithm and sub- 
sample selection scheme. 

After considering a wide variety of experimental and 
simulated time series we conclude that this method pro- 
vides alternative embedding strategies which are often 
smaller {k < de and £k < deT) and perform at least as 
well, but in general significantly better than, standard 
techniques. We have applied correlation dimension as 
a quantitative measure of the accuracy of dynamic re- 
construction and find that the non-uniform embedding 
strategy described here produces models which behave 
more like the true data. 

This embedding lag selection scheme provides a 
method to choose good embedding strategies for time 
delay embeddings. A straightforward extension of this 
idea will also allow one to select variables for more gen- 
eral multivariate problems. Obvious examples are in the 
selection of optimal inputs for artificial neural networks 
and for testing dependency among physical variables. 
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